Background of the Invention



The present invention relates to encoding and decoding apparatuses for transmitting a speech signal at a low bit rate and, more particularly, to a speech signal decoding method and apparatus for improving the quality of unvoiced speech.



As a popular method of encoding a speech signal at low and middle bit rates with high efficiency, a speech signal is divided into the parameters of a linear predictive filter and its driving sound source signal (sound source signal). One of the typical methods is CELP (Code Excited Linear Prediction). CELP obtains a synthesized speech signal (reconstructed signal) by driving a linear prediction filter, whose linear prediction coefficients represent the frequency characteristics of input speech, with an excitation signal given by the sum of a pitch signal representing the pitch period of speech and a sound source signal made up of a random number and a pulse. CELP is described in M. Schroeder et al., "Code-excited linear prediction: High-quality speech at very low bit rates", Proc. of IEEE Int. Conf. on Acoust., Speech and Signal Processing, pp. 937-940, 1985 (reference 1).




Mobile communications such as portable phones require high speech communication quality in noisy environments such as a crowded city street or a moving automobile. Speech coding based on the above-mentioned CELP suffers deterioration in the quality of speech on which noise is superposed (background noise speech). To improve the encoding quality of background noise speech, the gain of the sound source signal is smoothed in the decoder.

A method of smoothing the gain of a sound source signal is described in "Digital Cellular Telecommunication System; Adaptive Multi-Rate Speech Transcoding", ETSI Technical Report, GSM 06.90 version 2.0.0, January 1999 (reference 2).

Fig. 4 shows an example of a conventional speech signal decoding apparatus for improving the coding quality of background noise speech by smoothing the gain of a sound source signal. A bit stream is input at a period (frame) of $T_{fr}$ msec (e.g., 20 msec), and a reconstructed vector is calculated at a period (subframe) of $T_{fr}/N_{sfr}$ msec (e.g., 5 msec) for an integer $N_{sfr}$ (e.g., 4). The frame length is given by $L_{fr}$ samples (e.g., 320 samples), and the subframe length is given by $L_{sfr}$ samples (e.g., 80 samples). These numbers of samples are determined by the sampling frequency (e.g., 16 kHz) of an input signal. Each block will be described.
A code input circuit 1010 segments a bit stream input from an input terminal 10 and converts it into indices corresponding to a plurality of decoding parameters. The code input circuit 1010 outputs an index corresponding to an LSP (Line Spectrum Pair) representing the frequency characteristics of the input signal to an LSP decoding circuit 1020. The circuit 1010 outputs an index corresponding to a delay $L_{pd}$ representing the pitch period of the input signal to a pitch signal decoding circuit 1210, and an index corresponding to a sound source vector made up of a random number and a pulse to a sound source signal decoding circuit 1110. The circuit 1010 outputs an index corresponding to the first gain to a first gain decoding circuit 1220, and an index corresponding to the second gain to a second gain decoding circuit 1120.

The LSP decoding circuit 1020 has a table which stores a plurality of sets of LSPs. The LSP decoding circuit 1020 receives the index output from the code input circuit 1010, reads the LSP corresponding to the index from the table, and sets it as the LSP $q_j^{(N_{sfr})}(n)$, $j = 1, \ldots, N_p$, in the $N_{sfr}$th subframe of the current frame (nth frame), where $N_p$ is the linear prediction order. The LSPs of the first to $(N_{sfr}-1)$th subframes are obtained by linearly interpolating $q_j^{(N_{sfr})}(n)$ and $q_j^{(N_{sfr})}(n-1)$. The LSPs $q_j^{(m)}(n)$, $j = 1, \ldots, N_p$, $m = 1, \ldots, N_{sfr}$, are output to a linear prediction coefficient conversion circuit 1030 and a smoothing coefficient calculation circuit 1310.
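The subframe interpolation can be sketched as follows. This is a minimal illustration only: it assumes equal linear weights $m/N_{sfr}$ per subframe (reference 2 defines its own interpolation weights), and the function name `interpolate_lsp` is hypothetical.

```python
def interpolate_lsp(prev_last, cur_last, n_sfr):
    """Return the LSPs q_j^(m)(n) for m = 1..n_sfr (hypothetical helper).

    prev_last: LSPs of the last subframe of frame n-1, q_j^(N_sfr)(n-1)
    cur_last:  decoded LSPs of the last subframe of frame n, q_j^(N_sfr)(n)
    Assumes linear weights m / n_sfr, an illustrative choice.
    """
    if len(prev_last) != len(cur_last):
        raise ValueError("LSP vectors must have the same order N_p")
    subframes = []
    for m in range(1, n_sfr + 1):
        w = m / n_sfr  # weight given to the current frame's LSPs
        subframes.append([(1.0 - w) * p + w * c
                          for p, c in zip(prev_last, cur_last)])
    return subframes


# N_sfr = 4: the last subframe reproduces the decoded LSPs exactly.
lsp = interpolate_lsp([0.0, 1.0], [4.0, 5.0], 4)
```

Because the weight reaches 1 at $m = N_{sfr}$, the last subframe always uses the LSPs read from the table unchanged.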

The linear prediction coefficient conversion circuit 1030 receives the LSPs $q_j^{(m)}(n)$, $j = 1, \ldots, N_p$, $m = 1, \ldots, N_{sfr}$, output from the LSP decoding circuit 1020. The linear prediction coefficient conversion circuit 1030 converts the received $q_j^{(m)}(n)$ into linear prediction coefficients $\alpha_j^{(m)}(n)$, $j = 1, \ldots, N_p$, $m = 1, \ldots, N_{sfr}$, and outputs $\alpha_j^{(m)}(n)$ to a synthesis filter 1040. Conversion of the LSP into the linear prediction coefficient can adopt a known method, e.g., the method described in Section 5.2.4 of reference 2.
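For illustration, the conversion can be sketched with the textbook sum/difference polynomial construction rather than the exact procedure of reference 2. The sketch assumes the LSPs are line spectral frequencies in radians, strictly increasing in $(0, \pi)$, an even order $N_p$, and the sign convention $1/A(z) = 1/(1 - \sum_i \alpha_i z^{-i})$ used for the synthesis filter in this description; `lsf_to_lpc` is a hypothetical name.

```python
import math


def _poly_mul(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def lsf_to_lpc(lsf):
    """Line spectral frequencies (radians) -> alpha_1..alpha_Np."""
    p = len(lsf)
    if p % 2:
        raise ValueError("this sketch assumes an even prediction order N_p")
    poly_p = [1.0, 1.0]   # (1 + z^-1): trivial root of P(z) at z = -1
    poly_q = [1.0, -1.0]  # (1 - z^-1): trivial root of Q(z) at z = +1
    for i, w in enumerate(lsf):
        factor = [1.0, -2.0 * math.cos(w), 1.0]  # conjugate root pair on |z| = 1
        if i % 2 == 0:    # w_1, w_3, ... are roots of P(z)
            poly_p = _poly_mul(poly_p, factor)
        else:             # w_2, w_4, ... are roots of Q(z)
            poly_q = _poly_mul(poly_q, factor)
    a = [0.5 * (x + y) for x, y in zip(poly_p, poly_q)]  # A(z) = (P(z) + Q(z)) / 2
    # a[0] == 1 and a[p + 1] == 0; flip signs to match 1 - sum alpha_i z^-i
    return [-c for c in a[1:p + 1]]
```

As a sanity check, equally spaced frequencies $\omega_i = i\pi/(N_p + 1)$ correspond to the flat filter $A(z) = 1$, so all returned coefficients are (numerically) zero.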

The sound source signal decoding circuit 1110 has a table which stores a plurality of sound source vectors. The sound source signal decoding circuit 1110 receives the index output from the code input circuit 1010, reads the sound source vector corresponding to the index from the table, and outputs the vector to a second gain circuit 1130.

The second gain decoding circuit 1120 has a table which stores a plurality of gains. The second gain decoding circuit 1120 receives the index output from the code input circuit 1010, reads the second gain corresponding to the index from the table, and outputs the second gain to a smoothing circuit 1320.

The second gain circuit 1130 receives the first sound source vector output from the sound source signal decoding circuit 1110 and the second gain output from the smoothing circuit 1320, multiplies the first sound source vector by the second gain to decode a second sound source vector, and outputs the decoded second sound source vector to an adder 1050.

A storage circuit 1240 receives and holds the excitation vector from the adder 1050, and outputs the held past excitation vector to the pitch signal decoding circuit 1210.

The pitch signal decoding circuit 1210 receives the past excitation vector held by the storage circuit 1240 and the index output from the code input circuit 1010. The index designates the delay $L_{pd}$. From the past excitation vector, the pitch signal decoding circuit 1210 extracts $L_{sfr}$ samples, corresponding to the vector length, starting at the point $L_{pd}$ samples before the start of the current frame, and thereby decodes a first pitch signal (vector). For $L_{pd} < L_{sfr}$, the circuit 1210 extracts a vector of $L_{pd}$ samples and repeatedly concatenates the extracted $L_{pd}$ samples to decode the first pitch vector having a vector length of $L_{sfr}$ samples. The pitch signal decoding circuit 1210 outputs the first pitch vector to a first gain circuit 1230.
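The extraction and repetition rule can be sketched as below, assuming an integer delay and a plain list as the storage-circuit buffer (the AMR codec of reference 2 also uses fractional delays, which this sketch omits). The helper name `decode_pitch_vector` is hypothetical.

```python
def decode_pitch_vector(past_excitation, delay, l_sfr):
    """Build the first pitch vector from the past excitation (hypothetical helper).

    past_excitation: samples held by the storage circuit, oldest first, ending
                     just before the start of the current (sub)frame
    delay:           integer pitch delay L_pd (fractional delays not handled)
    l_sfr:           subframe length L_sfr
    """
    if not 0 < delay <= len(past_excitation):
        raise ValueError("delay must lie within the stored excitation")
    start = len(past_excitation) - delay
    segment = past_excitation[start:start + min(delay, l_sfr)]
    # For L_pd >= L_sfr this copies L_sfr samples; for L_pd < L_sfr the
    # L_pd-sample segment is repeated until L_sfr samples are produced.
    return [segment[i % len(segment)] for i in range(l_sfr)]


print(decode_pitch_vector(list(range(10)), 3, 7))  # [7, 8, 9, 7, 8, 9, 7]
```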

The first gain decoding circuit 1220 has a table which stores a plurality of gains. The first gain decoding circuit 1220 receives the index output from the code input circuit 1010, reads the first gain corresponding to the index, and outputs the first gain to the first gain circuit 1230.
The first gain circuit 1230 receives the first pitch vector output from the pitch signal decoding circuit 1210 and the first gain output from the first gain decoding circuit 1220, multiplies the first pitch vector by the first gain to generate a second pitch vector, and outputs the generated second pitch vector to the adder 1050.

The adder 1050 receives the second pitch vector output from the first gain circuit 1230 and the second sound source vector output from the second gain circuit 1130, adds them, and outputs the sum as an excitation vector to the synthesis filter 1040.

The smoothing coefficient calculation circuit 1310 receives the LSPs $q_j^{(m)}(n)$ output from the LSP decoding circuit 1020, and calculates an average LSP $q_{0j}(n)$:

$$q_{0j}(n) = 0.84 \cdot q_{0j}(n-1) + 0.16 \cdot q_j^{(N_{sfr})}(n)$$

The smoothing coefficient calculation circuit 1310 calculates an LSP variation amount $d_0(m)$ for each subframe $m$:

$$d_0(m) = \sum_{j=1}^{N_p} \frac{\left| q_{0j}(n) - q_j^{(m)}(n) \right|}{q_{0j}(n)}$$

The smoothing coefficient calculation circuit 1310 calculates a smoothing coefficient $k_0(m)$ of the subframe $m$:

$$k_0(m) = \min\bigl(0.25, \max(0, d_0(m) - 0.4)\bigr) / 0.25$$

where $\min(x, y)$ is a function that takes the smaller of $x$ and $y$, and $\max(x, y)$ is a function that takes the larger of $x$ and $y$. The smoothing coefficient calculation circuit 1310 outputs the smoothing coefficient $k_0(m)$ to the smoothing circuit 1320.

The smoothing circuit 1320 receives the smoothing coefficient $k_0(m)$ output from the smoothing coefficient calculation circuit 1310 and the second gain output from the second gain decoding circuit 1120. The smoothing circuit 1320 calculates an average gain $\bar{g}_0(m)$ from the second gains $g_0(m)$ of the current and preceding subframes:

$$\bar{g}_0(m) = \frac{1}{5} \sum_{i=0}^{4} g_0(m-i)$$

The second gain $g_0(m)$ is replaced by

$$g_0(m) \leftarrow k_0(m) \cdot g_0(m) + \bigl(1 - k_0(m)\bigr) \cdot \bar{g}_0(m)$$

The smoothing circuit 1320 outputs the second gain $g_0(m)$ to the second gain circuit 1130.
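The prior-art smoothing can be sketched as follows. The constants 0.84/0.16, 0.4 and 0.25 are those given above; the five-subframe gain average and the once-per-frame update of the average LSP are assumptions modeled on reference 2, and the class and method names are hypothetical.

```python
from collections import deque


class ConventionalGainSmoother:
    """Sketch of the prior-art smoothing of Fig. 4 (circuits 1310 and 1320).

    Hypothetical class; assumes the average LSP q_0j is updated once per
    frame and that the gain average covers the current and four previous
    subframes.
    """

    def __init__(self, initial_lsp, history=5):
        self.q0 = list(initial_lsp)          # average LSP q_0j(n)
        self.gains = deque(maxlen=history)   # most recent second gains g_0

    def update_average_lsp(self, q_last):
        # q_0j(n) = 0.84 * q_0j(n - 1) + 0.16 * q_j^(Nsfr)(n)
        self.q0 = [0.84 * a + 0.16 * b for a, b in zip(self.q0, q_last)]

    def smoothing_coefficient(self, q_m):
        # d_0(m) = sum_j |q_0j(n) - q_j^(m)(n)| / q_0j(n)
        d0 = sum(abs(a - b) / a for a, b in zip(self.q0, q_m))
        # k_0(m) = min(0.25, max(0, d_0(m) - 0.4)) / 0.25, always in [0, 1]
        return min(0.25, max(0.0, d0 - 0.4)) / 0.25

    def smooth_gain(self, q_m, g0):
        k0 = self.smoothing_coefficient(q_m)
        self.gains.append(g0)
        g_avg = sum(self.gains) / len(self.gains)
        # k_0 = 1 keeps the decoded gain; k_0 = 0 replaces it by the average
        return k0 * g0 + (1.0 - k0) * g_avg
```

With a stable spectrum ($d_0(m) \le 0.4$) the decoded gain is fully replaced by its running average, while a large spectral change ($d_0(m) \ge 0.65$) leaves it untouched.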

The synthesis filter 1040 receives the excitation vector output from the adder 1050 and the linear prediction coefficients $\alpha_i^{(m)}(n)$, $i = 1, \ldots, N_p$, output from the linear prediction coefficient conversion circuit 1030. The synthesis filter 1040 calculates a reconstructed vector by driving the synthesis filter, in which the linear prediction coefficients are set, with the excitation vector. Then, the synthesis filter 1040 outputs the reconstructed vector from an output terminal 20. Letting $\alpha_i$, $i = 1, \ldots, N_p$, be the linear prediction coefficients, the transfer function $1/A(z)$ of the synthesis filter is given by

$$\frac{1}{A(z)} = \frac{1}{1 - \sum_{i=1}^{N_p} \alpha_i z^{-i}}$$
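A minimal sketch of the adder 1050 and the synthesis filter 1040 for one subframe is given below. It is a direct-form recursion written for clarity, not the fixed-point implementation of reference 2; the function names are hypothetical.

```python
def build_excitation(pitch_vector, source_vector, g1, g2):
    """Adder 1050: first gain * pitch vector + second gain * sound source vector."""
    return [g1 * v + g2 * c for v, c in zip(pitch_vector, source_vector)]


def synthesis_filter(alpha, excitation, memory=None):
    """Run 1/A(z) = 1/(1 - sum_i alpha_i z^-i) over one subframe.

    alpha:  linear prediction coefficients alpha_1..alpha_Np
    memory: previous outputs [s(n-1), s(n-2), ..., s(n-Np)]; zeros if None
    Returns (reconstructed samples, updated memory) so that the filter state
    carries over to the next subframe.
    """
    p = len(alpha)
    if p == 0:
        return list(excitation), []
    mem = list(memory) if memory is not None else [0.0] * p
    out = []
    for u in excitation:
        s = u + sum(a * m for a, m in zip(alpha, mem))  # s(n) = u(n) + sum alpha_i s(n-i)
        out.append(s)
        mem = [s] + mem[:-1]
    return out, mem


exc = build_excitation([1.0, 0.0, 0.0, 0.0], [0.0] * 4, 1.0, 0.5)
print(synthesis_filter([0.5], exc)[0])  # [1.0, 0.5, 0.25, 0.125]
```

Returning the filter memory lets consecutive subframes, each with their own $\alpha_i^{(m)}(n)$, be filtered without discontinuities at the subframe boundaries.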

Fig. 5 shows the arrangement of a speech signal encoding apparatus in a conventional speech signal encoding/decoding apparatus. A first gain circuit 1230, second gain circuit 1130, adder 1050, and storage circuit 1240 are the same as the blocks described in the conventional speech signal decoding apparatus in Fig. 4, and a description thereof will be omitted.

An input signal (input vector), generated by sampling a speech signal and combining a plurality of samples as one frame into one vector, is input from an input terminal 30. A linear prediction coefficient calculation circuit 5510 receives the input vector from the input terminal 30. The linear prediction coefficient calculation circuit 5510 performs linear prediction analysis on the input vector to obtain a linear prediction coefficient. Linear prediction analysis is described in Chapter 8 "Linear Predictive Coding of Speech" of reference 4. The linear prediction coefficient calculation circuit 5510 outputs the linear prediction coefficient to an LSP conversion/quantization circuit 5520, a weighting filter 5050, and a weighting synthesis filter 5040.

The LSP conversion/quantization circuit 5520 
receives the linear prediction coefficient output from 
the linear prediction coefficient calculation circuit 
5510, converts the linear prediction coefficient into 
LSP, and quantizes the LSP to attain the quantized LSP. 
Conversion of the linear prediction coefficient into the
LSP can adopt a known method, e.g., a method described 
in Section 5.2.4 of reference 2. 

Quantization of the LSP can adopt a method described in Section 5.2.5 of reference 2. As described for the LSP decoding circuit of Fig. 4 (prior art), the quantized LSP is the quantized LSP $\hat{q}_j^{(N_{sfr})}(n)$, $j = 1, \ldots, N_p$, in the $N_{sfr}$th subframe of the current frame (nth frame). The quantized LSPs of the first to $(N_{sfr}-1)$th subframes are obtained by linearly interpolating $\hat{q}_j^{(N_{sfr})}(n)$ and $\hat{q}_j^{(N_{sfr})}(n-1)$. Likewise, the LSP is the LSP $q_j^{(N_{sfr})}(n)$, $j = 1, \ldots, N_p$, in the $N_{sfr}$th subframe of the current frame (nth frame), and the LSPs of the first to $(N_{sfr}-1)$th subframes are obtained by linearly interpolating $q_j^{(N_{sfr})}(n)$ and $q_j^{(N_{sfr})}(n-1)$.

The LSP conversion/quantization circuit 5520 outputs the LSPs $q_j^{(m)}(n)$, $j = 1, \ldots, N_p$, $m = 1, \ldots, N_{sfr}$, and the quantized LSPs $\hat{q}_j^{(m)}(n)$, $j = 1, \ldots, N_p$, $m = 1, \ldots, N_{sfr}$, to a linear prediction coefficient conversion circuit 5030, and an index corresponding to the quantized LSP $\hat{q}_j^{(N_{sfr})}(n)$, $j = 1, \ldots, N_p$, to a code output circuit 6010.

The linear prediction coefficient conversion circuit 5030 receives the LSPs $q_j^{(m)}(n)$ and the quantized LSPs $\hat{q}_j^{(m)}(n)$, $j = 1, \ldots, N_p$, $m = 1, \ldots, N_{sfr}$, output from the LSP conversion/quantization circuit 5520. The circuit 5030 converts $q_j^{(m)}(n)$ into linear prediction coefficients $\alpha_j^{(m)}(n)$, and $\hat{q}_j^{(m)}(n)$ into quantized linear prediction coefficients $\hat{\alpha}_j^{(m)}(n)$, $j = 1, \ldots, N_p$, $m = 1, \ldots, N_{sfr}$. The linear prediction coefficient conversion circuit 5030 outputs $\alpha_j^{(m)}(n)$ to the weighting filter 5050 and the weighting synthesis filter 5040, and $\hat{\alpha}_j^{(m)}(n)$ to the weighting synthesis filter 5040. Conversion of the LSP into the linear prediction coefficient and conversion of the quantized LSP into the quantized linear prediction coefficient can adopt a known method, e.g., the method described in Section 5.2.4 of reference 2.

The weighting filter 5050 receives the input vector from the input terminal 30 and the linear prediction coefficient output from the linear prediction coefficient conversion circuit 5030, and generates a weighting filter $W(z)$ using the linear prediction coefficient. The weighting filter is driven by the input vector to obtain a weighted input vector, and the weighting filter 5050 outputs the weighted input vector to a subtractor 5060. The transfer function $W(z)$ of the weighting filter is given by

$$W(z) = \frac{Q(z/\gamma_1)}{Q(z/\gamma_2)}$$

where

$$Q(z/\gamma_1) = 1 - \sum_{i=1}^{N_p} \alpha_i^{(m)}(n)\, \gamma_1^{i} z^{-i}, \qquad Q(z/\gamma_2) = 1 - \sum_{i=1}^{N_p} \alpha_i^{(m)}(n)\, \gamma_2^{i} z^{-i}$$

and $\gamma_1$ and $\gamma_2$ are constants, e.g., $\gamma_1 = 0.9$ and $\gamma_2 = 0.6$. Details of the weighting filter are described in reference 1.
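The weighting filter can be sketched as a pole-zero recursion. `bandwidth_expand` and `weighting_filter` are hypothetical names, the defaults are the example constants above, and a single coefficient set is assumed for the whole input (subframe-wise coefficient switching is omitted).

```python
def bandwidth_expand(alpha, gamma):
    """Coefficients of Q(z/gamma) = 1 - sum_i alpha_i gamma^i z^-i (leading 1 omitted)."""
    return [a * gamma ** (i + 1) for i, a in enumerate(alpha)]


def weighting_filter(alpha, x, gamma1=0.9, gamma2=0.6):
    """Filter x through W(z) = Q(z/gamma1) / Q(z/gamma2) with zero initial state."""
    num = bandwidth_expand(alpha, gamma1)
    den = bandwidth_expand(alpha, gamma2)
    x_hist = [0.0] * len(alpha)  # x(n-1), x(n-2), ...
    y_hist = [0.0] * len(alpha)  # y(n-1), y(n-2), ...
    y = []
    for xn in x:
        # y(n) = x(n) - sum_i num_i x(n-i) + sum_i den_i y(n-i)
        yn = (xn - sum(b * v for b, v in zip(num, x_hist))
              + sum(a * v for a, v in zip(den, y_hist)))
        y.append(yn)
        if alpha:
            x_hist = [xn] + x_hist[:-1]
            y_hist = [yn] + y_hist[:-1]
    return y
```

Because $\gamma_1 > \gamma_2$, the zeros of $W(z)$ sit closer to the unit circle than its poles, which de-emphasizes the formant regions of the error signal.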

The weighting synthesis filter 5040 receives the excitation vector output from the adder 1050, and the linear prediction coefficients $\alpha_j^{(m)}(n)$ and the quantized linear prediction coefficients $\hat{\alpha}_j^{(m)}(n)$, $j = 1, \ldots, N_p$, $m = 1, \ldots, N_{sfr}$, output from the linear prediction coefficient conversion circuit 5030. The weighting synthesis filter 5040 drives the filter $H(z)W(z) = Q(z/\gamma_1)/[\hat{A}(z)\, Q(z/\gamma_2)]$, in which $\alpha_j^{(m)}(n)$ and $\hat{\alpha}_j^{(m)}(n)$ are set, by the excitation vector to obtain a weighted reconstructed vector. The transfer function $H(z) = 1/\hat{A}(z)$ of the synthesis filter is given by

$$H(z) = \frac{1}{1 - \sum_{i=1}^{N_p} \hat{\alpha}_i^{(m)}(n)\, z^{-i}}$$

The subtractor 5060 receives the weighted input vector output from the weighting filter 5050 and the weighted reconstructed vector output from the weighting synthesis filter 5040, calculates their difference, and outputs it as a difference vector to a minimizing circuit 5070.

The minimizing circuit 5070 sequentially outputs all indices corresponding to the sound source vectors stored in a sound source signal generation circuit 5110 to the sound source signal generation circuit 5110. The minimizing circuit 5070 sequentially outputs indices corresponding to all delays $L_{pd}$ within a predetermined range to a pitch signal generation circuit 5210. The minimizing circuit 5070 sequentially outputs indices corresponding to all first gains stored in a first gain generation circuit 6220 to the first gain generation circuit 6220, and indices corresponding to all second gains stored in a second gain generation circuit 6120 to the second gain generation circuit 6120. The minimizing circuit 5070 sequentially receives the difference vectors output from the subtractor 5060, calculates their norms, selects the sound source vector, delay $L_{pd}$, and first and second gains that minimize the norm, and outputs the corresponding indices to the code output circuit 6010.

The pitch signal generation circuit 5210, sound source signal generation circuit 5110, first gain generation circuit 6220, and second gain generation circuit 6120 sequentially receive the indices output from the minimizing circuit 5070. These circuits are the same as the pitch signal decoding circuit 1210, sound source signal decoding circuit 1110, first gain decoding circuit 1220, and second gain decoding circuit 1120 in Fig. 4 except for their input/output connections, and a detailed description of these blocks will be omitted.

The code output circuit 6010 receives the index corresponding to the quantized LSP output from the LSP conversion/quantization circuit 5520, and the indices corresponding to the sound source vector, delay $L_{pd}$, and first and second gains that are output from the minimizing circuit 5070. The code output circuit 6010 converts these indices into a bit stream code, and outputs it via an output terminal.
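The joint search performed by the minimizing circuit 5070 is a brute-force analysis-by-synthesis loop. A minimal sketch, assuming the candidate vectors and gains are supplied as plain lists and the weighted synthesis filter is passed in as a function; all names are hypothetical, and a joint search of this kind is practical only for small codebooks.

```python
import itertools


def exhaustive_search(target, pitch_vectors, source_vectors, gains1, gains2,
                      weighted_synthesis):
    """Return the index tuple (delay, sound source, first gain, second gain)
    whose weighted reconstruction is closest to `target`.

    weighted_synthesis: callable mapping an excitation list to a weighted
    reconstructed vector (stands in for filter 5040; hypothetical).
    """
    best_err, best_idx = None, None
    for (ip, v), (ic, c), (i1, g1), (i2, g2) in itertools.product(
            enumerate(pitch_vectors), enumerate(source_vectors),
            enumerate(gains1), enumerate(gains2)):
        excitation = [g1 * a + g2 * b for a, b in zip(v, c)]
        recon = weighted_synthesis(excitation)
        # squared norm of the difference vector; same minimizer as the norm
        err = sum((t - r) ** 2 for t, r in zip(target, recon))
        if best_err is None or err < best_err:
            best_err, best_idx = err, (ip, ic, i1, i2)
    return best_idx
```

Minimizing the squared norm selects the same candidates as minimizing the norm itself and avoids a square root per candidate.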

The first problem is that a sound different from normal voiced speech is generated in short unvoiced speech intermittently contained in voiced speech or in part of the voiced speech. As a result, discontinuous sound is generated in the voiced speech. This is because the LSP variation amount $d_0(m)$ decreases in the short unvoiced speech, which increases the smoothing coefficient. Since $d_0(m)$ greatly varies over time, it exhibits a large value to a certain degree in part of the voiced speech, but the smoothing coefficient does not become 0.

The second problem is that the smoothing coefficient abruptly changes in unvoiced speech. As a result, discontinuous sound is generated in the unvoiced speech. This is because the smoothing coefficient is determined using $d_0(m)$, which greatly varies over time.

The third problem is that proper smoothing processing corresponding to the type of background noise cannot be selected. As a result, the decoding quality degrades. This is because the decoding parameter is smoothed based on a single algorithm in which only the parameter settings differ.

It is an object of the present invention to provide a speech signal decoding method and apparatus for improving the quality of reconstructed speech for background noise speech.

To achieve the above object, according to the present invention, there is provided a speech signal decoding method comprising the steps of decoding information containing at least a sound source signal, gain, and filter coefficients from a received bit stream, identifying voiced speech and unvoiced speech of a speech signal using the decoded information, performing smoothing processing based on the decoded information for at least either one of the decoded gain and the decoded filter coefficients in the unvoiced speech, and decoding the speech signal by driving a filter having the decoded filter coefficients by an excitation signal obtained by multiplying the decoded sound source signal by the decoded gain using a result of the smoothing processing.
Fig. 1 is a block diagram showing a speech signal decoding apparatus according to the first embodiment of the present invention;

Fig. 2 is a block diagram showing a speech signal decoding apparatus according to the second embodiment of the present invention;

Fig. 3 is a block diagram showing a speech signal encoding apparatus used in the present invention;

Fig. 4 is a block diagram showing a conventional speech signal decoding apparatus; and

Fig. 5 is a block diagram showing a conventional speech signal encoding apparatus.

The present invention will be described in detail below with reference to the accompanying drawings.

Fig. 1 shows a speech signal decoding apparatus according to the first embodiment of the present invention. An input terminal 10, output terminal 20, LSP decoding circuit 1020, linear prediction coefficient conversion circuit 1030, sound source signal decoding circuit 1110, storage circuit 1240, pitch signal decoding circuit 1210, first gain circuit 1230, second gain circuit 1130, adder 1050, and synthesis filter 1040 are the same as the blocks described in the prior art of Fig. 4, and a description thereof will be omitted.
The code input circuit 1010, voiced/unvoiced identification circuit 2020, noise classification circuit 2030, first switching circuit 2110, second switching circuit 2210, first filter 2150, second filter 2160, third filter 2170, fourth filter 2250, fifth filter 2260, sixth filter 2270, speech mode decoding circuit 2050, frame power decoding circuit 2040, first gain decoding circuit 2220, and second gain decoding circuit 2120 will be described.

A bit stream is input at a period (frame) of $T_{fr}$ msec (e.g., 20 msec), and a reconstructed vector is calculated at a period (subframe) of $T_{fr}/N_{sfr}$ msec (e.g., 5 msec) for an integer $N_{sfr}$ (e.g., 4). The frame length is given by $L_{fr}$ samples (e.g., 320 samples), and the subframe length is given by $L_{sfr}$ samples (e.g., 80 samples). These numbers of samples are determined by the sampling frequency (e.g., 16 kHz) of an input signal. Each block will be described.
The code input circuit 1010 segments a bit stream input from the input terminal 10 into several segments, and converts them into indices corresponding to a plurality of decoding parameters. The code input circuit 1010 outputs an index corresponding to the LSP to the LSP decoding circuit 1020, an index corresponding to a speech mode to a speech mode decoding circuit 2050, an index corresponding to a frame power to a frame power decoding circuit 2040, an index corresponding to a delay to the pitch signal decoding circuit 1210, and an index corresponding to a sound source vector to the sound source signal decoding circuit 1110. The code input circuit 1010 outputs an index corresponding to the first gain to a first gain decoding circuit 2220, and an index corresponding to the second gain to a second gain decoding circuit 2120.

The speech mode decoding circuit 2050 receives the index corresponding to the speech mode that is output from the code input circuit 1010, and sets a speech mode $S_{mode}$ corresponding to the index. The speech mode is determined in the speech encoder by threshold processing of an intra-frame average $G_{op}(n)$ of an open-loop pitch prediction gain $G_{op}^{(m)}(n)$ calculated using a perceptually weighted input signal, and is transmitted to the decoder. Here, $n$ represents the frame number, and $m$ the subframe number. Determination of the speech mode is described in K. Ozawa et al., "M-LCELP Speech Coding at 4 kb/s with Multi-Mode and Multi-Codebook", IEICE Trans. on Commun., Vol. E77-B, No. 9, pp. 1114-1121, September 1994 (reference 3). The speech mode decoding circuit 2050 outputs the speech mode $S_{mode}$ to the voiced/unvoiced identification circuit 2020, first gain decoding circuit 2220, and second gain decoding circuit 2120.

The frame power decoding circuit 2040 has a table 2040a which stores a plurality of frame energies. The frame power decoding circuit 2040 receives the index corresponding to the frame power that is output from the code input circuit 1010, and reads the frame power $E_{rms}$ corresponding to the index from the table 2040a. The frame power is attained by quantizing the power of the input signal in the speech encoder, and an index corresponding to the quantized value is transmitted to the decoder. The frame power decoding circuit 2040 outputs the frame power $E_{rms}$ to the voiced/unvoiced identification circuit 2020, first gain decoding circuit 2220, and second gain decoding circuit 2120.

The voiced/unvoiced identification circuit 2020 receives the LSPs $q_j^{(m)}(n)$ output from the LSP decoding circuit 1020, the speech mode $S_{mode}$ output from the speech mode decoding circuit 2050, and the frame power $E_{rms}$ output from the frame power decoding circuit 2040.

The sequence of obtaining the variation amount of a spectral parameter will be explained. The LSP is used as the spectral parameter. In the nth frame, a long-term average $\bar{q}_j(n)$ of the LSP is calculated. The variation amount $d_q(n)$ of the LSP in the nth frame is defined by

$$d_q(n) = \sum_{j=1}^{N_p} \frac{D_{q,j}(n)}{\bar{q}_j(n)}$$

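where $D_{q,j}(n)$ corresponds to the distance between $\bar{q}_j(n)$ and $q_j^{(N_{sfr})}(n)$. For example,

$$D_{q,j}(n) = \left(\bar{q}_j(n) - q_j^{(N_{sfr})}(n)\right)^2$$

or

$$D_{q,j}(n) = \left|\bar{q}_j(n) - q_j^{(N_{sfr})}(n)\right|$$

In this case, $D_{q,j}(n) = \left|\bar{q}_j(n) - q_j^{(N_{sfr})}(n)\right|$ is employed.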

A section where the variation amount $d_q(n)$ is large substantially corresponds to voiced speech, whereas a section where $d_q(n)$ is small substantially corresponds to unvoiced speech. However, $d_q(n)$ greatly varies over time, and the ranges of $d_q(n)$ in voiced speech and in unvoiced speech overlap each other. Thus, a threshold for identifying voiced speech and unvoiced speech is difficult to set.

For this reason, the long-term average of $d_q(n)$ is used to identify voiced speech and unvoiced speech. A long-term average $d_{q1}(n)$ of $d_q(n)$ is calculated using a linear or non-linear filter. As $d_{q1}(n)$, the average, median, or mode of $d_q(n)$ can be applied. In this case,

$$d_{q1}(n) = \beta_1 \cdot d_{q1}(n-1) + (1 - \beta_1) \cdot d_q(n)$$

is used, where $\beta_1 = 0.9$.

Threshold processing for $d_{q1}(n)$ determines an identification flag $S_{vs}$:
if ($d_{q1}(n) > C_{th1}$) then $S_{vs} = 1$
else $S_{vs} = 0$

where $C_{th1}$ is a given constant (e.g., 2.2), $S_{vs} = 1$ corresponds to voiced speech, and $S_{vs} = 0$ corresponds to unvoiced speech.

In a highly stationary section, $d_q(n)$ is small, so even voiced speech may be mistaken for unvoiced speech. To avoid this, a section where the frame power and pitch prediction gain are large is regarded as voiced speech. For $S_{vs} = 0$, $S_{vs}$ is corrected by the following additional determination:

if ($E_{rms} > C_{rms}$ and $S_{mode} > 2$) then $S_{vs} = 1$
else $S_{vs} = 0$

where $C_{rms}$ is a given constant (e.g., 10,000), and $S_{mode} > 2$ corresponds to an intra-frame average $G_{op}(n)$ of the pitch prediction gain of 3.5 dB or more. This correspondence is defined by the encoder.
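The identification rule above can be sketched as follows. The constants are the example values given in the text; the class name, the zero initial state of $d_{q1}$, and passing the long-term LSP average $\bar{q}_j(n)$ in from outside (its update rule is not reproduced here) are assumptions of this sketch.

```python
class VoicedUnvoicedDetector:
    """Sketch of circuit 2020 (hypothetical class).

    The long-term LSP average q_bar is supplied by the caller; d_q1 starts
    at 0 (an assumption).
    """

    def __init__(self, beta1=0.9, c_th1=2.2, c_rms=10000.0, mode_threshold=2):
        self.beta1 = beta1
        self.c_th1 = c_th1
        self.c_rms = c_rms
        self.mode_threshold = mode_threshold
        self.dq1 = 0.0

    @staticmethod
    def variation(q_bar, q):
        # d_q(n) = sum_j |q_bar_j(n) - q_j(n)| / q_bar_j(n)
        return sum(abs(a - b) / a for a, b in zip(q_bar, q))

    def update(self, q_bar, q, e_rms, s_mode):
        dq = self.variation(q_bar, q)
        # d_q1(n) = beta1 * d_q1(n - 1) + (1 - beta1) * d_q(n)
        self.dq1 = self.beta1 * self.dq1 + (1.0 - self.beta1) * dq
        s_vs = 1 if self.dq1 > self.c_th1 else 0
        # correction: frames with high power and high pitch gain are voiced
        if s_vs == 0 and e_rms > self.c_rms and s_mode > self.mode_threshold:
            s_vs = 1
        return s_vs, self.dq1
```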

The voiced/unvoiced identification circuit 2020 outputs $S_{vs}$ to the noise classification circuit 2030, first switching circuit 2110, and second switching circuit 2210, and outputs $d_{q1}(n)$ to the noise classification circuit 2030.

The noise classification circuit 2030 receives $d_{q1}(n)$ and $S_{vs}$ output from the voiced/unvoiced identification circuit 2020. In unvoiced speech (noise), a value $d_{q2}(n)$ that reflects the average behavior of $d_{q1}(n)$ is obtained using a linear or non-linear filter. For $S_{vs} = 0$,

$$d_{q2}(n) = \beta_2 \cdot d_{q2}(n-1) + (1 - \beta_2) \cdot d_{q1}(n)$$

is calculated with $\beta_2 = 0.94$.

Threshold processing for $d_{q2}(n)$ classifies the noise to determine a classification flag $S_{nz}$:

if ($d_{q2}(n) > C_{th2}$) then $S_{nz} = 1$
else $S_{nz} = 0$

where $C_{th2}$ is a given constant (e.g., 1.7), $S_{nz} = 1$ corresponds to noise whose frequency characteristics change unsteadily over time, and $S_{nz} = 0$ corresponds to noise whose frequency characteristics change steadily over time. The noise classification circuit 2030 outputs $S_{nz}$ to the first and second switching circuits 2110 and 2210.
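A corresponding sketch of the noise classification follows, with the example constants from the text. Holding $d_{q2}(n)$ and $S_{nz}$ unchanged during voiced speech, and starting $d_{q2}$ at 0, are assumptions, since the text defines the update only for $S_{vs} = 0$; the class name is hypothetical.

```python
class NoiseClassifier:
    """Sketch of circuit 2030 (hypothetical class).

    Assumes d_q2 and S_nz are held unchanged while S_vs = 1, and that d_q2
    starts at 0.
    """

    def __init__(self, beta2=0.94, c_th2=1.7):
        self.beta2 = beta2
        self.c_th2 = c_th2
        self.dq2 = 0.0
        self.s_nz = 0

    def update(self, dq1, s_vs):
        if s_vs == 0:
            # d_q2(n) = beta2 * d_q2(n - 1) + (1 - beta2) * d_q1(n)
            self.dq2 = self.beta2 * self.dq2 + (1.0 - self.beta2) * dq1
            # S_nz = 1: unsteadily changing noise, S_nz = 0: steadily changing noise
            self.s_nz = 1 if self.dq2 > self.c_th2 else 0
        return self.s_nz
```

The larger constant $\beta_2 = 0.94$ makes $d_{q2}(n)$ react more slowly than $d_{q1}(n)$, so the noise class does not flip on isolated frames.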

The first switching circuit 2110 receives the LSPs $q_j^{(m)}(n)$ output from the LSP decoding circuit 1020, the identification flag $S_{vs}$ output from the voiced/unvoiced identification circuit 2020, and the classification flag $S_{nz}$ output from the noise classification circuit 2030. The first switching circuit 2110 is switched in accordance with the identification and classification flag values to output $q_j^{(m)}(n)$ to the first filter 2150 for $S_{vs} = 0$ and $S_{nz} = 0$, to the second filter 2160 for $S_{vs} = 0$ and $S_{nz} = 1$, and to the third filter 2170 for $S_{vs} = 1$.
The first filter 2150 receives the LSPs $q_j^{(m)}(n)$ output from the first switching circuit 2110, smooths them using a linear or non-linear filter, and outputs them as first smoothed LSPs $\bar{q}_{1j}^{(m)}(n)$ to the linear prediction coefficient conversion circuit 1030. In this case, the first filter 2150 uses a filter given by

$$\bar{q}_{1j}^{(m)}(n) = \gamma_{11} \cdot \bar{q}_{1j}^{(m-1)}(n) + (1 - \gamma_{11}) \cdot q_j^{(m)}(n), \quad j = 1, \ldots, N_p$$

where $\bar{q}_{1j}^{(0)}(n) = \bar{q}_{1j}^{(N_{sfr})}(n-1)$ and $\gamma_{11} = 0.5$.

The second filter 2160 receives the LSPs $q_j^{(m)}(n)$ output from the first switching circuit 2110, smooths them using a linear or non-linear filter, and outputs them as second smoothed LSPs $\bar{q}_{2j}^{(m)}(n)$ to the linear prediction coefficient conversion circuit 1030. In this case, the second filter 2160 uses a filter given by

$$\bar{q}_{2j}^{(m)}(n) = \gamma_{12} \cdot \bar{q}_{2j}^{(m-1)}(n) + (1 - \gamma_{12}) \cdot q_j^{(m)}(n), \quad j = 1, \ldots, N_p$$

where $\bar{q}_{2j}^{(0)}(n) = \bar{q}_{2j}^{(N_{sfr})}(n-1)$ and $\gamma_{12} = 0.0$.

The third filter 2170 receives the LSPs $q_j^{(m)}(n)$ output from the first switching circuit 2110, smooths them using a linear or non-linear filter, and outputs them as third smoothed LSPs $\bar{q}_{3j}^{(m)}(n)$ to the linear prediction coefficient conversion circuit 1030. In this case, $\bar{q}_{3j}^{(m)}(n) = q_j^{(m)}(n)$.
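The switch and the three LSP filters can be sketched together, using the smoothing constants given above (0.5 for the first filter and 0.0 for the second). The class name, the per-filter state handling, and the initialization from a caller-supplied LSP vector are assumptions of this sketch.

```python
class SwitchedLspSmoother:
    """Sketch of switching circuit 2110 with filters 2150, 2160 and 2170.

    Hypothetical class. Each recursive filter keeps its own state, carried
    across subframes and frames; the state is assumed to be updated only
    when that filter is selected, and is initialized from `initial_lsp`.
    """

    GAMMA = {"first": 0.5, "second": 0.0}  # constants of filters 2150 and 2160

    def __init__(self, initial_lsp):
        self.state = {name: list(initial_lsp) for name in self.GAMMA}

    def smooth(self, q, s_vs, s_nz):
        if s_vs == 1:
            return list(q)                          # third filter 2170: pass-through
        name = "first" if s_nz == 0 else "second"   # filter 2150 or 2160
        g = self.GAMMA[name]
        out = [g * prev + (1.0 - g) * cur
               for prev, cur in zip(self.state[name], q)]
        self.state[name] = out
        return out
```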

The second switching circuit 2210 receives the second gain $g_2^{(m)}(n)$ output from the second gain decoding circuit 2120, the identification flag $S_{vs}$ output from the voiced/unvoiced identification circuit 2020, and the classification flag $S_{nz}$ output from the noise classification circuit 2030. The second switching circuit 2210 is switched in accordance with the identification and classification flag values to output the second gain $g_2^{(m)}(n)$ to the fourth filter 2250 for $S_{vs} = 0$ and $S_{nz} = 0$, to the fifth filter 2260 for $S_{vs} = 0$ and $S_{nz} = 1$, and to the sixth filter 2270 for $S_{vs} = 1$.

The fourth filter 2250 receives the second gain $g_2^{(m)}(n)$ output from the second switching circuit 2210, smooths it using a linear or non-linear filter, and outputs it as a first smoothed gain $\bar{g}_{21}^{(m)}(n)$ to the second gain circuit 1130. In this case, the fourth filter 2250 uses a filter given by

$$\bar{g}_{21}^{(m)}(n) = \gamma_{21} \cdot \bar{g}_{21}^{(m-1)}(n) + (1 - \gamma_{21}) \cdot g_2^{(m)}(n)$$

where $\bar{g}_{21}^{(0)}(n) = \bar{g}_{21}^{(N_{sfr})}(n-1)$ and $\gamma_{21} = 0.9$.

The fifth filter 2260 receives the second gain $g_2^{(m)}(n)$ output from the second switching circuit 2210, smooths it using a linear or non-linear filter, and outputs it as a second smoothed gain $\bar{g}_{22}^{(m)}(n)$ to the second gain circuit 1130. In this case, the fifth filter 2260 uses a filter given by

$$\bar{g}_{22}^{(m)}(n) = \gamma_{22} \cdot \bar{g}_{22}^{(m-1)}(n) + (1 - \gamma_{22}) \cdot g_2^{(m)}(n)$$

where $\bar{g}_{22}^{(0)}(n) = \bar{g}_{22}^{(N_{sfr})}(n-1)$ and $\gamma_{22} = 0.9$.

The sixth filter 2270 receives the second gain $g_2^{(m)}(n)$ output from the second switching circuit 2210, smooths it using a linear or non-linear filter, and outputs it as a third smoothed gain $\bar{g}_{23}^{(m)}(n)$ to the second gain circuit 1130. In this case, $\bar{g}_{23}^{(m)}(n) = g_2^{(m)}(n)$.
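The gain path can be sketched the same way. The fourth and fifth filters share the constant 0.9 given above but, in this sketch, keep separate states so that the two noise classes are smoothed independently; the class name, the initial gain of 0, and updating a state only when its filter is selected are assumptions.

```python
class SwitchedGainSmoother:
    """Sketch of switching circuit 2210 with filters 2250, 2260 and 2270.

    Hypothetical class. Filters 2250 and 2260 use the same constant (0.9)
    but keep separate states; states are assumed to update only when
    selected and to start at `initial_gain`.
    """

    def __init__(self, initial_gain=0.0, gamma_fourth=0.9, gamma_fifth=0.9):
        self.gamma = {"fourth": gamma_fourth, "fifth": gamma_fifth}
        self.state = {"fourth": initial_gain, "fifth": initial_gain}

    def smooth(self, g2, s_vs, s_nz):
        if s_vs == 1:
            return g2                                # sixth filter 2270: pass-through
        name = "fourth" if s_nz == 0 else "fifth"    # filter 2250 or 2260
        g = self.gamma[name]
        # g_bar(m) = gamma * g_bar(m - 1) + (1 - gamma) * g_2(m)
        self.state[name] = g * self.state[name] + (1.0 - g) * g2
        return self.state[name]
```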

The first gain decoding circuit 2220 has a 
25 table 2220a which stores a plurality of gains. The 
first gain decoding circuit 2220 receives an index 
corresponding to the third gain output from the code 
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input circuit 1010, the speech mode S.„,e output from the 
speech mode decoding circuit 2050, the frame power E_ 
output from the frame power decoding circuit 2040, the 
linear prediction coefficient ^(n), j = l,A,Np of the 
5 mth subframe of the nth frame output from the linear 
prediction coefficient conversion circuit 1030, and a 
pitch vector c..(i), i = l^A.^s. output from the pitch 
signal decoding circuit 1210. 

The first gain decoding circuit 2220 calculates a k parameter $k_j^{(m)}(n)$, $j = 1, \ldots, N_p$ (to be simply represented as $k_j$), from the linear prediction coefficient $\alpha_j^{(m)}(n)$. This is calculated by a known method, e.g., the method described in Section 8.3.2 of L.R. Rabiner et al., "Digital Processing of Speech Signals", Prentice-Hall, 1978 (reference 4). Then, the first gain decoding circuit 2220 calculates an estimated residual power $E_{res}$ using $k_j$:

$$E_{res} = E_{rms} \sqrt{\prod_{j=1}^{N_p} \left(1 - k_j^2\right)}$$
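The conversion from $\alpha_j$ to $k_j$ can be done with the step-down (backward Levinson) recursion. The sketch below assumes the predictor convention $\tilde{s}(n) = \sum_{j=1}^{N_p} \alpha_j s(n-j)$ used in reference 4 (a codec that defines $A(z)$ with the opposite sign would flip the signs of $\alpha_j$ first), stores $\alpha_1$ at list index 0, and uses hypothetical function names.

```python
import math
from typing import List, Sequence


def lpc_to_reflection(alpha: Sequence[float]) -> List[float]:
    """Step-down recursion: predictor coefficients -> reflection (k) parameters.

    alpha[0] holds alpha_1, ..., alpha[Np - 1] holds alpha_Np.
    Returns [k_1, ..., k_Np].
    """
    a = list(alpha)
    p = len(a)
    k = [0.0] * p
    for i in range(p, 0, -1):  # i = Np, ..., 1
        ki = a[i - 1]          # k_i = alpha_i^(i)
        k[i - 1] = ki
        if abs(ki) >= 1.0:
            raise ValueError("unstable synthesis filter: |k_i| >= 1")
        denom = 1.0 - ki * ki
        # alpha_j^(i-1) = (alpha_j^(i) + k_i * alpha_{i-j}^(i)) / (1 - k_i^2)
        a = [(a[j - 1] + ki * a[i - j - 1]) / denom for j in range(1, i)]
    return k


def estimated_residual_power(e_rms: float, k: Sequence[float]) -> float:
    """E_res = E_rms * sqrt(prod_j (1 - k_j^2))."""
    prod = 1.0
    for kj in k:
        prod *= 1.0 - kj * kj
    return e_rms * math.sqrt(prod)
```

For example, a single reflection coefficient $k_1 = 0.6$ gives $E_{res} = E_{rms}\sqrt{1 - 0.36} = 0.8\,E_{rms}$.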

The first gain decoding circuit 2220 reads a third gain $\gamma_{ac}$ corresponding to the index from the table 2220a switched by the speech mode $S_{mode}$, and calculates a first gain $g_{ac}$.

The first gain decoding circuit 2220 outputs the first gain $g_{ac}$ to the first gain circuit 1230. The second gain decoding circuit 2120 has a table 2120a which stores a plurality of gains.

The second gain decoding circuit 2120 receives an index corresponding to the fourth gain output from the code input circuit 1010, the speech mode $S_{mode}$ output from the speech mode decoding circuit 2050, the frame power $E_{rms}$ output from the frame power decoding circuit 2040, the linear prediction coefficient $\alpha_j^{(m)}(n)$, $j = 1, \ldots, N_p$, of the $m$th subframe of the $n$th frame output from the linear prediction coefficient conversion circuit 1030, and a sound source vector $c_{ec}(i)$, $i = 0, \ldots, L_{sfr} - 1$, output from the sound source signal decoding circuit 1110.

The second gain decoding circuit 2120 calculates a k parameter $k_j^{(m)}(n)$, $j = 1, \ldots, N_p$ (to be simply represented as $k_j$), from the linear prediction coefficient $\alpha_j^{(m)}(n)$, by the same known method as described for the first gain decoding circuit 2220. Then, the second gain decoding circuit 2120 calculates an estimated residual power $E_{res}$ using $k_j$:

$$E_{res} = E_{rms} \sqrt{\prod_{j=1}^{N_p} \left(1 - k_j^2\right)}$$

The second gain decoding circuit 2120 reads a fourth gain $\gamma_{ec}$ corresponding to the index from the table 2120a switched by the speech mode $S_{mode}$, and calculates a second gain $g_{ec}$.

The second gain decoding circuit 2120 outputs the second gain $g_{ec}$ to the second switching circuit 2210.

Fig. 2 shows a speech signal decoding apparatus according to the second embodiment of the present invention.

This speech signal decoding apparatus of the present invention is implemented by replacing the frame power decoding circuit 2040 in the first embodiment with a power calculation circuit 3040, the speech mode decoding circuit 2050 with a speech mode determination circuit 3050, the first gain decoding circuit 2220 with a first gain decoding circuit 1220, and the second gain decoding circuit 2120 with a second gain decoding circuit 1120. In this arrangement, the frame power and speech mode are not encoded and transmitted by the encoder; instead, the power and speech mode are obtained from parameters used in the decoder.

The first and second gain decoding circuits 1220 and 1120 are the same as the blocks described in the prior art of Fig. 4, and a description thereof will be omitted.

The power calculation circuit 3040 receives a reconstructed vector output from a synthesis filter 1040, calculates a power from the sum of squares of the reconstructed vector, and outputs the power to a voiced/unvoiced identification circuit 2020. In this case, the power is calculated for each subframe.

Calculation of the power in the $m$th subframe uses the reconstructed signal output from the synthesis filter 1040 in the $(m-1)$th subframe. For a reconstructed signal $s_{syn}(i)$, $i = 0, \ldots, L_{sfr} - 1$, the power $E_{rms}$ is calculated by, e.g., RMS (Root Mean Square):

$$E_{rms} = \sqrt{\frac{1}{L_{sfr}} \sum_{i=0}^{L_{sfr}-1} s_{syn}^2(i)}$$
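Because the power for the $m$th subframe comes from the reconstructed signal of the $(m-1)$th subframe, it is available before subframe $m$ is synthesized. A minimal Python sketch of this one-subframe delay follows; the generator name and the value used before any subframe has been reconstructed (0.0 here) are assumptions.

```python
import math
from typing import Iterable, Iterator, Sequence


def rms(x: Sequence[float]) -> float:
    """Root mean square of one subframe."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def delayed_subframe_power(subframes: Iterable[Sequence[float]],
                           initial: float = 0.0) -> Iterator[float]:
    """Yield the power used in subframe m: the RMS of reconstructed subframe m-1."""
    prev = initial  # assumption: value used for the very first subframe
    for s_syn in subframes:
        yield prev          # power known before subframe m is reconstructed
        prev = rms(s_syn)   # becomes the power for subframe m+1
```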



The speech mode determination circuit 3050 receives a past excitation vector $e_{mem}(i)$, $i = 0, \ldots, L_{mem} - 1$, and the pitch index. The pitch index represents a delay $L_{pd}$, and $L_{mem}$ is a constant determined by the maximum value of $L_{pd}$. In the $m$th subframe, a pitch prediction gain $G_{emem}(m)$, $m = 1, \ldots, N_{sfr}$, is calculated from the past excitation vector $e_{mem}(i)$ and the delay $L_{pd}$:

$$G_{emem}(m) = 10 \cdot \log_{10}\left(g_{emem}(m)\right)$$

where

$$g_{emem}(m) = \frac{1}{1 - \dfrac{E_c^2(m)}{E_{a1}(m)\,E_{a2}(m)}}$$

$$E_{a1}(m) = \sum_{i=0}^{L_{sfr}-1} e_{mem}^2(i)$$

$$E_{a2}(m) = \sum_{i=0}^{L_{sfr}-1} e_{mem}^2(i - L_{pd})$$

$$E_c(m) = \sum_{i=0}^{L_{sfr}-1} e_{mem}(i)\, e_{mem}(i - L_{pd})$$

The pitch prediction gain $G_{emem}(m)$ or the intra-frame average $\bar{G}_{emem}(n)$ of $G_{emem}(m)$ in the $n$th frame undergoes the following threshold processing to set a speech mode $S_{mode}$:

if ($\bar{G}_{emem}(n) > 3.5$) then $S_{mode} = 2$
else $S_{mode} = 0$

The speech mode determination circuit 3050 outputs the speech mode $S_{mode}$ to the voiced/unvoiced identification circuit 2020.
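The pitch prediction gain and the mode threshold can be sketched in Python as follows. The buffer layout (index `start + i` holding $e_{mem}(i)$, so that `start + i - lag` holds $e_{mem}(i - L_{pd})$), the 0 dB result for a silent window, and the function names are assumptions, not taken from the text.

```python
import math
from typing import Sequence


def pitch_prediction_gain_db(x: Sequence[float], start: int, length: int, lag: int) -> float:
    """G(m) = 10*log10(g(m)), with g(m) = 1 / (1 - Ec^2 / (Ea1 * Ea2))."""
    if lag < 1 or start - lag < 0 or start + length > len(x):
        raise ValueError("analysis window outside buffer")
    cur = x[start:start + length]                 # e(i)
    past = x[start - lag:start - lag + length]    # e(i - lag)
    ea1 = sum(v * v for v in cur)
    ea2 = sum(v * v for v in past)
    ec = sum(a * b for a, b in zip(cur, past))
    if ea1 == 0.0 or ea2 == 0.0:
        return 0.0  # assumption: a silent window has no prediction gain
    ratio = ec * ec / (ea1 * ea2)  # squared normalized correlation, 0 <= ratio <= 1
    if ratio >= 1.0:
        return math.inf  # perfectly predictable window
    return 10.0 * math.log10(1.0 / (1.0 - ratio))


def speech_mode(gains_db: Sequence[float], threshold_db: float = 3.5) -> int:
    """Threshold the intra-frame average of the subframe gains."""
    average = sum(gains_db) / len(gains_db)
    return 2 if average > threshold_db else 0
```

For example, a window whose squared normalized correlation with its lagged copy is 0.5 yields $10\log_{10} 2 \approx 3.01$ dB, just below the 3.5 dB threshold.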

Fig. 3 shows a speech signal encoding 
apparatus used in the present invention. 

The speech signal encoding apparatus in Fig. 3 is implemented by adding a frame power calculation circuit 5540 and a speech mode determination circuit 5550 to the prior art of Fig. 5, replacing the first and second gain generation circuits 6220 and 6120 with first and second gain generation circuits 5220 and 5120, and replacing the code output circuit 6010 with a code output circuit 5010. The first and second gain generation circuits 5220 and 5120, an adder 1050, and a storage circuit 1240 are the same as the blocks described in the prior art of Fig. 5, and a description thereof will be omitted.

The frame power calculation circuit 5540 has a table 5540a which stores a plurality of frame energies. The frame power calculation circuit 5540 receives an input vector from an input terminal 30, calculates the RMS (Root Mean Square) of the input vector, and quantizes the RMS using the table to obtain a quantized frame power $E_{rms}$. For an input vector $s_{in}(i)$, $i = 0, \ldots, L_{sfr} - 1$, the power $E_{rms}$ is given by

$$E_{rms} = \sqrt{\frac{1}{L_{sfr}} \sum_{i=0}^{L_{sfr}-1} s_{in}^2(i)}$$

The frame power calculation circuit 5540 outputs the quantized frame power $E_{rms}$ to the first and second gain generation circuits 5220 and 5120, and an index corresponding to $E_{rms}$ to the code output circuit 5010.
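A minimal sketch of the frame power calculation and table quantization follows. It assumes the table is searched for the entry closest to the RMS in the linear domain; the search rule and table contents are not given in the text (practical codecs often search in a logarithmic domain instead), and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def rms(x: Sequence[float]) -> float:
    """Root mean square of the input vector."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def quantize_frame_power(x: Sequence[float], table: Sequence[float]) -> Tuple[int, float]:
    """Return (index, quantized E_rms): nearest table entry to the RMS (assumed rule)."""
    e = rms(x)
    index = min(range(len(table)), key=lambda j: abs(table[j] - e))
    return index, table[index]
```

The index goes to the code output circuit, and the quantized value goes to the gain generation circuits, so the encoder uses exactly the value that the decoder will reconstruct.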

The speech mode determination circuit 5550 receives a weighted input vector output from a weighting filter 5050. The speech mode $S_{mode}$ is determined by executing threshold processing for the intra-frame average $\bar{G}_{op}(n)$ of an open-loop pitch prediction gain $G_{op}(m)$ calculated using the weighted input vector. Here, $n$ represents the frame number, and $m$ the subframe number.

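In the $m$th subframe, the following two equations are calculated from a weighted input vector $s_{wi}(i)$ and a candidate delay $L_{tmp}$, and the $L_{tmp}$ that maximizes $E_{sc}^2(L_{tmp})/E_{sa2}(L_{tmp})$ is obtained and set as $L_{op}$:

$$E_{sc}(L_{tmp}) = \sum_{i=0}^{L_{sfr}-1} s_{wi}(i)\, s_{wi}(i - L_{tmp})$$

$$E_{sa2}(L_{tmp}) = \sum_{i=0}^{L_{sfr}-1} s_{wi}^2(i - L_{tmp})$$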
From the weighted input vector $s_{wi}(i)$ and the delay $L_{op}$, the pitch prediction gain $G_{op}(m)$, $m = 1, \ldots, N_{sfr}$, is calculated:

$$G_{op}(m) = 10 \cdot \log_{10}\left(g_{op}(m)\right)$$

where

$$g_{op}(m) = \frac{1}{1 - \dfrac{E_{sc}^2(m)}{E_{sa1}(m)\,E_{sa2}(m)}}$$

$$E_{sa1}(m) = \sum_{i=0}^{L_{sfr}-1} s_{wi}^2(i)$$

$$E_{sa2}(m) = \sum_{i=0}^{L_{sfr}-1} s_{wi}^2(i - L_{op})$$

$$E_{sc}(m) = \sum_{i=0}^{L_{sfr}-1} s_{wi}(i)\, s_{wi}(i - L_{op})$$

The pitch prediction gain $G_{op}(m)$ or the intra-frame average $\bar{G}_{op}(n)$ of $G_{op}(m)$ in the $n$th frame undergoes the following threshold processing to set the speech mode $S_{mode}$:

if ($\bar{G}_{op}(n) > 3.5$) then $S_{mode} = 2$
else $S_{mode} = 0$

Determination of the speech mode is described in K. Ozawa et al., "M-LCELP Speech Coding at 4 kb/s with Multi-Mode and Multi-Codebook", IEICE Trans. on Commun. (reference 3).
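The open-loop lag search of the speech mode determination circuit 5550 can be sketched as an exhaustive search over a candidate lag range, maximizing $E_{sc}^2/E_{sa2}$ (squared correlation normalized by the energy of the lagged segment). The search range `lag_min`..`lag_max`, the buffer layout (index `start + i` holding $s_{wi}(i)$), and the function name are assumptions. Note that the squared correlation also rewards strongly negative correlation; whether lags with $E_{sc} < 0$ should be excluded is not stated here.

```python
from typing import Sequence


def open_loop_lag(sw: Sequence[float], start: int, length: int,
                  lag_min: int, lag_max: int) -> int:
    """Return L_op: the lag in [lag_min, lag_max] maximizing E_sc^2 / E_sa2."""
    if lag_min < 1 or start - lag_max < 0 or start + length > len(sw):
        raise ValueError("search window outside buffer")
    cur = sw[start:start + length]                  # s_wi(i)
    best_lag, best_score = lag_min, -1.0
    for lag in range(lag_min, lag_max + 1):
        past = sw[start - lag:start - lag + length]  # s_wi(i - L_tmp)
        esc = sum(a * b for a, b in zip(cur, past))
        esa2 = sum(v * v for v in past)
        if esa2 <= 0.0:
            continue  # no energy at this lag: ratio undefined
        score = esc * esc / esa2
        if score > best_score:  # strict ">" keeps the first (shortest) best lag
            best_lag, best_score = lag, score
    return best_lag
```

For a pulse train with a period of 50 samples, the search over lags 20 to 80 returns 50.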

The speech mode determination circuit 5550 outputs the speech mode $S_{mode}$ to the first and second gain generation circuits 5220 and 5120, and an index corresponding to the speech mode $S_{mode}$ to the code output circuit 5010.

A pitch signal generation circuit 5210, a sound source signal generation circuit 5110, and the first and second gain generation circuits 5220 and 5120 sequentially receive indices output from a minimizing circuit 5070. The pitch signal generation circuit 5210, sound source signal generation circuit 5110, first gain generation circuit 5220, and second gain generation circuit 5120 are the same as the pitch signal decoding circuit 1210, sound source signal decoding circuit 1110, first gain decoding circuit 2220, and second gain decoding circuit 2120 in Fig. 1 except for input/output connections, and a detailed description of these blocks will be omitted.

The code output circuit 5010 receives an index corresponding to the quantized LSP output from the LSP conversion/quantization circuit 5520, an index corresponding to the quantized frame power output from the frame power calculation circuit 5540, an index corresponding to the speech mode output from the speech mode determination circuit 5550, and indices output from the minimizing circuit 5070. The code output circuit 5010 converts these indices into a bit stream code, and outputs it via an output terminal 40.

The arrangement of a speech signal encoding apparatus in a speech signal encoding/decoding apparatus according to the fourth embodiment of the present invention is the same as that of the speech signal encoding apparatus in the conventional speech signal encoding/decoding apparatus, and a description thereof will be omitted.
In the above-described embodiments, the long-term average of $d_0(m)$ varies over time more gradually than $d_0(m)$ itself, and does not intermittently decrease in voiced speech. If the smoothing coefficient is determined in accordance with this average, discontinuous sound generated in short unvoiced speech intermittently contained in voiced speech can be reduced. By identifying voiced or unvoiced speech using the average, the smoothing coefficient of the decoding parameter can be set to exactly 0 in voiced speech.

Also, for unvoiced speech, using the long-term average of $d_0(m)$ prevents the smoothing coefficient from changing abruptly.

The present invention smooths the decoding parameter in unvoiced speech not by a single processing method but by selectively using a plurality of processing methods prepared in consideration of the characteristics of an input signal. These methods include moving average processing, which calculates the decoding parameter from past decoding parameters within a limited section; auto-regressive processing, which can take long-term past influence into account; and non-linear processing, which limits the value obtained by average calculation with a preset upper or lower limit.
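The three smoothing families can be sketched as small stateful filters in Python. The window length, the coefficient $\gamma$, the clipping limits, and the class names are hypothetical parameters, and the rule for choosing among the filters according to the input signal is not specified here, so it is left out.

```python
from collections import deque
from typing import Deque


class MovingAverage:
    """Mean of the last `length` decoded parameter values (limited section)."""

    def __init__(self, length: int) -> None:
        self.buf: Deque[float] = deque(maxlen=length)

    def __call__(self, x: float) -> float:
        self.buf.append(x)
        return sum(self.buf) / len(self.buf)


class AutoRegressive:
    """y(n) = gamma * y(n - 1) + (1 - gamma) * x(n): long-term past influence."""

    def __init__(self, gamma: float, initial: float = 0.0) -> None:
        self.gamma, self.prev = gamma, initial

    def __call__(self, x: float) -> float:
        self.prev = self.gamma * self.prev + (1.0 - self.gamma) * x
        return self.prev


class ClippedAverage:
    """Non-linear variant: moving average, then clipping to [lower, upper]."""

    def __init__(self, length: int, lower: float, upper: float) -> None:
        self.avg = MovingAverage(length)
        self.lower, self.upper = lower, upper

    def __call__(self, x: float) -> float:
        return min(max(self.avg(x), self.lower), self.upper)
```

A moving average forgets old values completely after the window length, whereas the auto-regressive form decays their influence geometrically; clipping bounds the smoothed value regardless of outliers.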

According to the first effect of the present invention, sound that differs from normal voiced speech and is generated in short unvoiced speech intermittently contained in voiced speech, or in part of the voiced speech, can be reduced, thereby reducing discontinuous sound in the voiced speech. This is because the long-term average of $d_0(m)$, which hardly varies over time, is used in the short unvoiced speech, and because voiced speech and unvoiced speech are identified and the smoothing coefficient is set to 0 in the voiced speech.

According to the second effect of the present invention, abrupt changes in the smoothing coefficient in unvoiced speech are reduced, which reduces discontinuous sound in the unvoiced speech. This is because the smoothing coefficient is determined using the long-term average of $d_0(m)$, which hardly varies over time.

According to the third effect of the present invention, smoothing processing can be selected in accordance with the type of background noise to improve the decoding quality. This is because the decoding parameter is smoothed by selectively using a plurality of processing methods in accordance with the characteristics of an input signal.




